
    -F 256 sam/ERR1823601.bam \
    > sam/ERR1823601_unmapped.bam

samtools view \
    -b -f 12 \
    -F 256 sam/ERR1823608.bam \
    > sam/ERR1823608_unmapped.bam

The “-f 12” option keeps only reads whose FLAG has both bit 4 (read unmapped) and bit 8
(mate unmapped) set, that is, pairs in which neither the forward nor the reverse read was
mapped. The “-F 256” option excludes secondary alignments. Refer to Chapter 2 for the
FLAG field of the SAM/BAM file.
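
To make the filter logic concrete, the following is a minimal sketch that applies the same bitwise tests to single FLAG values. The function name “passes_filter” is hypothetical and is not part of Samtools; it only mimics what “-f” (all listed bits must be set) and “-F” (none of the listed bits may be set) are understood to do.

```shell
# Hypothetical helper (not a Samtools command): mimic -f/-F on one FLAG value.
# $1 = FLAG value, $2 = bits required by -f, $3 = bits excluded by -F
passes_filter() {
    if [ $(( $1 & $2 )) -eq "$2" ] && [ $(( $1 & $3 )) -eq 0 ]; then
        echo keep
    else
        echo drop
    fi
}

# 12 = 4 (read unmapped) + 8 (mate unmapped); 256 = secondary alignment
passes_filter 77 12 256    # 1+4+8+64: first in pair, read and mate unmapped
passes_filter 141 12 256   # 1+4+8+128: second in pair, read and mate unmapped
passes_filter 73 12 256    # 1+8+64: read mapped, only the mate is unmapped
passes_filter 99 12 256    # 1+2+32+64: properly paired, both mapped
```

Only the first two values pass, because “-f 12” requires both bits at once: a read whose mate is unmapped but which is itself mapped (FLAG 73) is dropped. Samtools also provides a “flags” subcommand that translates between numeric and named FLAG values.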

The above Samtools commands write the unmapped reads, which represent the pure
metagenomic data, to the separate BAM files “ERR1823587_unmapped.bam”,
“ERR1823601_unmapped.bam”, and “ERR1823608_unmapped.bam”.

8.2.3.5  Creating Paired-End FASTQ Files from BAM Files

Now we can extract FASTQ files from the above BAM files, two FASTQ files (forward and
reverse reads) from each BAM file. Before doing that, however, we need to sort the BAM
files by read name using the “samtools sort” command with the “-n” option, which places
the two reads of each pair next to each other.

samtools sort \
    -n -m 5G \
    -@ 2 sam/ERR1823587_unmapped.bam \
    -o sam/ERR1823587_unmapped_sorted.bam

samtools sort \
    -n -m 5G \
    -@ 2 sam/ERR1823601_unmapped.bam \
    -o sam/ERR1823601_unmapped_sorted.bam

samtools sort \
    -n -m 5G \
    -@ 2 sam/ERR1823608_unmapped.bam \
    -o sam/ERR1823608_unmapped_sorted.bam
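
Because the three sort commands differ only in the run accession, they can also be written as a loop. The following is a sketch rather than the book's own method: the variable “samples” and the helper “sorted_name” are hypothetical names introduced for illustration, and the guard simply skips a sample when Samtools or the input file is not available.

```shell
# Run accessions taken from the commands above.
samples="ERR1823587 ERR1823601 ERR1823608"

# Hypothetical helper: derive the name-sorted output path from an input path.
sorted_name() {
    echo "${1%.bam}_sorted.bam"
}

for id in $samples; do
    input="sam/${id}_unmapped.bam"
    output=$(sorted_name "$input")
    # Only run the sort when samtools is installed and the input file exists.
    if command -v samtools >/dev/null 2>&1 && [ -f "$input" ]; then
        samtools sort -n -m 5G -@ 2 -o "$output" "$input"
    else
        echo "skipping $input (samtools or input file not found)"
    fi
done
```

Note that “-m” sets the memory limit per thread, so with “-@ 2” the total memory used can approach roughly twice the stated value; lower it if the machine has less memory available.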

Then, we create FASTQ files from the sorted BAM files and store them in a new directory,
“fastq_pure”, so that we can use them in the next steps of the downstream analysis.

mkdir fastq_pure

samtools fastq -@ 4 sam/ERR1823587_unmapped_sorted.bam \
    -1 fastq_pure/ERR1823587_pure_R1.fastq.gz \
    -2 fastq_pure/ERR1823587_pure_R2.fastq.gz \
    -0 /dev/null -s /dev/null -n

samtools fastq -@ 4 sam/ERR1823601_unmapped_sorted.bam \
    -1 fastq_pure/ERR1823601_pure_R1.fastq.gz \
    -2 fastq_pure/ERR1823601_pure_R2.fastq.gz \
    -0 /dev/null -s /dev/null -n

samtools fastq -@ 4 sam/ERR1823608_unmapped_sorted.bam \